In computing, aliasing describes a situation in which a data location in memory can be accessed through different symbolic names in the program. Thus, modifying the data through one name implicitly modifies the values associated to all aliased names, which may not be expected by the programmer. As a result, aliasing makes it particularly difficult to understand, analyze and optimize programs. Aliasing analysers intend to make and compute useful information for understanding aliasing in programs.
Contents |
For example, the C programming language does not perform array bounds checking. One can then exploit the implementation of the programming language by the compiler, plus the computer architecture's assembly language conventions, to achieve aliasing effects.
If an array is created on the stack, with a variable laid out in memory directly beside that array, one could index outside that array and then directly change that variable by changing the relevant array element. For example, if we have an int array of size 10 (for this example's sake, calling it vector), next to another int variable (call it i), vector[10] (i.e. the 11th element) would be aliased to i if they are adjacent in memory.
This is possible in some implementations of C because an array is in reality a block of contiguous memory, and array elements are merely offsets off the address of the beginning of that block multiplied by the size of a single element. Since C has no bounds checking, indexing and addressing outside of the array is possible. Note that the aforementioned aliasing behaviour is implementation specific. Some implementations may leave space between arrays and variables on the stack, for instance, to align variables to memory locations that are a multiple of the architecture's native word size. The C standard does not generally specify how data is to be laid out in memory. (ISO/IEC 9899:1999, section 6.2.6.1).
It is not erroneous for a compiler to omit aliasing effects for accesses that fall outside the bounds of an array.
Another variety of aliasing can occur in any language that can refer to one location in memory with more than one name (for example, with pointers). See the C example of the xor swap algorithm that is a function; it assumes the two pointers passed to it are distinct, but if they are in fact equal (or aliases of each other), the function fails. This is a common problem with functions that accept pointer arguments, and their tolerance (or the lack thereof) for aliasing must be carefully documented, particularly for functions that perform complex manipulations on memory areas passed to them.
Controlled aliasing behaviour may be desirable in some cases (that is, aliasing behaviour that is specified, unlike that relevant to memory layout in C). It is common practice in Fortran. The Perl programming language specifies, in some constructs, aliasing behaviour, such as in foreach loops. This allows certain data structures to be modified directly with less code. For example,
my @array = (1, 2, 3); foreach my $element (@array) { # Increment $element, thus automatically # modifying @array, since $element is ''aliased'' # to each of @array's elements in turn. $element++; } print "@array \n";
will print out "2 3 4" as a result. If one wanted to bypass aliasing effects, one could copy the contents of the index variable into another and change the copy.
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x
is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10
) because it could be that *y
is an alias of x
. This could be the case after an assignment like y = &x
.
As an effect of the assignment to *y
, the value of x would be changed as well, so propagating the information that x
is 5 to the statements following *y = 10
would be potentially wrong (if *y
is indeed an alias of x
). However, if we have information about pointers, the constant propagation process could make a query like: can x
be an alias of *y
? Then, if the answer is no, x = 5
can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x
is not an alias of *y
, then code that uses or changes the value of x
can be moved before the assignment *y = 10
, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometime allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing
is invoked to prevent unwanted or invalid optimizations that could produce incorrect code.